MixER: linear interpolation of latent space for entity resolution

Abstract

Entity resolution, accurately identifying various representations of the same real-world entities, is a crucial part of data integration systems. While existing learning-based models can achieve good performance, they are extremely dependent on the quantity and quality of training data. In this paper, the MixER model is proposed to alleviate these problems. The model utilizes our newly designed data augmentation method called EMix. EMix maps discrete entity records to continuous latent space variables (e.g., probability distributions) and then linearly interpolates in the latent space to generate many augmented samples. The matching model is further optimized based on the augmented samples to strengthen its generalization capability. MixER achieves significant strengths in sensitivity experiments when the training data is below 50. In robustness experiments, MixER presents an absolute performance advantage when label noise exceeds 20%. In addition, ablation experiments demonstrate that the developed EMix can effectively improve the ability of the model. The overall experimental results prove that MixER exhibited excellent performance over current state-of-the-art methods.
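The interpolation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' EMix implementation: the abstract does not say how records are encoded or how labels are handled, so the sketch assumes the latent vectors are already given as lists of floats, and it borrows two conventions from generic mixup-style augmentation, namely drawing the mixing weight from a Beta distribution and interpolating the labels with the same weight. The names `interpolate`, `mix_samples`, and `alpha` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def interpolate(a: Sequence[float], b: Sequence[float], lam: float) -> Vector:
    """Return the element-wise linear interpolation lam * a + (1 - lam) * b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


def mix_samples(
    latents: Sequence[Sequence[float]],
    labels: Sequence[float],
    n_aug: int,
    alpha: float = 0.4,  # assumed Beta parameter, not taken from the paper
    seed: int = 0,
) -> List[Tuple[Vector, float]]:
    """Generate n_aug augmented (latent, soft label) pairs by interpolation."""
    rng = random.Random(seed)
    augmented = []
    for _ in range(n_aug):
        # Pick two samples at random; they may coincide.
        i = rng.randrange(len(latents))
        j = rng.randrange(len(latents))
        # Assumption: mixing weight drawn from Beta(alpha, alpha), as in mixup.
        lam = rng.betavariate(alpha, alpha)
        z = interpolate(latents[i], latents[j], lam)
        # Assumption: labels are interpolated with the same weight.
        y = lam * labels[i] + (1.0 - lam) * labels[j]
        augmented.append((z, y))
    return augmented


if __name__ == "__main__":
    # Toy latent vectors for a non-matching (0.0) and a matching (1.0) pair.
    for z, y in mix_samples([[0.0, 0.0], [1.0, 1.0]], [0.0, 1.0], n_aug=3):
        print(z, y)
```

Each augmented point lies on the line segment between two existing latent vectors, which is what makes the interpolation linear; a matching model would then be trained on the original and augmented samples together.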

Related resources

A Latent Dirichlet Model for Unsupervised Entity Resolution

Entity resolution has received considerable attention in recent years. Given many references to underlying entities, the goal is to predict which references correspond to the same entity. We show how to extend the Latent Dirichlet Allocation model for this task and propose a probabilistic model for collective entity resolution for relational domains where references are connected to each other....

A Latent Dirichlet Allocation Model for Entity Resolution

In this paper, we address the problem of entity resolution, where given many references to underlying objects, the task is to predict which references correspond to the same object. We propose a probabilistic model for collective entity resolution. Our approach differs from other recently proposed entity resolution approaches in that it is a) unsupervised, b) generative and c) introduces a hidd...

The Effect of Transitive Closure on the Calibration of Logistic Regression for Entity Resolution

This paper describes a series of experiments in using logistic regression machine learning as a method for entity resolution. From these experiments the authors concluded that when a supervised ML algorithm is trained to classify a pair of entity references as a linked or not-linked pair, the evaluation of the model’s performance should take into account the transitive closure of its pairwise lin...

Information Space Models for Data Integration, and Entity Resolution

Geospatial information systems provide a unique frame of reference to bring together a large and diverse set of data from a variety of sources. However, automating this process remains a challenge since: 1) data (particularly from sensors) is error prone and ambiguous, 2) analysis and visualization tools typically expect clean (or exact) data, and 3) it is difficult to describe how different da...

Warped distance for space-variant linear image interpolation

The problem of image interpolation using linear techniques is dealt with in this paper. Conventional space-invariant methods are revisited and changed into space-variant ones, by introducing the concept of the warped distance among the pixels of an image. A better perceptual rendition of the image details is obtained in this way; this effect is proved both via the evaluation of the response to ...

Journal

Journal title: Complex & Intelligent Systems

Year: 2023

ISSN: 2198-6053, 2199-4536

DOI: https://doi.org/10.1007/s40747-023-01018-2